Deep Denoising Auto-encoder for Statistical Speech Synthesis

Authors

Zhenzhou Wu

Shinji Takaki

Junichi Yamagishi

Abstract

This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique automatically extracts low-dimensional features from high-dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and text-to-speech experiments. Our results confirm that the proposed method increases the quality of synthetic speech in both experiments.
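The general idea behind a denoising auto-encoder can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses a deep network on real spectral features, while this example assumes a single hidden layer, additive Gaussian corruption, a tanh encoder, a linear decoder, and synthetic data standing in for spectral frames. All names (`train_denoising_autoencoder`, `encode`, `noise_std`, and so on) are hypothetical.

```python
import numpy as np


def train_denoising_autoencoder(x, n_hidden=8, noise_std=0.1,
                                lr=0.2, epochs=300, seed=0):
    """Train a one-hidden-layer denoising auto-encoder by gradient descent.

    Illustrative assumptions: Gaussian input corruption, tanh encoder,
    linear decoder, mean squared error against the clean input.
    Returns (parameters, list of per-epoch reconstruction errors).
    """
    rng = np.random.default_rng(seed)
    n_dim = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, size=(n_dim, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_dim))
    b_dec = np.zeros(n_dim)
    losses = []
    for _ in range(epochs):
        # Corrupt the input; the network must reconstruct the clean frame.
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        h = np.tanh(noisy @ w_enc + b_enc)      # low-dimensional code
        recon = h @ w_dec + b_dec               # reconstruction
        err = recon - x                         # compare with CLEAN input
        losses.append(float(np.mean(err ** 2)))

        # Backpropagate the mean squared error.
        g_recon = 2.0 * err / err.size
        g_w_dec = h.T @ g_recon
        g_b_dec = g_recon.sum(axis=0)
        g_pre = (g_recon @ w_dec.T) * (1.0 - h ** 2)  # tanh derivative
        g_w_enc = noisy.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)

        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec
    return (w_enc, b_enc, w_dec, b_dec), losses


def encode(x, params):
    """Map high-dimensional frames to the learned low-dimensional code."""
    w_enc, b_enc, _, _ = params
    return np.tanh(x @ w_enc + b_enc)


# Synthetic stand-in for spectral frames: 200 frames of 64 dimensions
# generated from 4 hidden factors (an assumption, not real speech data).
data_rng = np.random.default_rng(1)
latent = data_rng.normal(size=(200, 4))
mixing = 0.5 * data_rng.normal(size=(4, 64))
frames = np.tanh(latent @ mixing)

params, losses = train_denoising_autoencoder(frames)
codes = encode(frames, params)
print(codes.shape, losses[0], losses[-1])
```

Here the 8-dimensional `codes` play the role that the extracted low-dimensional features play in the abstract. A deep variant would stack several such encoder and decoder layers.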


Similar resources

Statistical Parametric Speech Synthesis Using Bottleneck Representation From Sequence Auto-encoder

In this paper, we describe a statistical parametric speech synthesis approach with unit-level acoustic representation. In conventional deep neural network based speech synthesis, the input text features are repeated for the entire duration of a phoneme for mapping text and speech parameters. This mapping is learnt at the frame level, which is the de facto acoustic representation. However much of t...


Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement

Speech enhancement is a challenging and important area of research due to the many applications that depend on improved signal quality. It is a pre-processing step of speech processing systems and is used for perceptually improving the quality of speech for humans. With recent advances in Deep Neural Networks (DNN), deep Denoising Auto-Encoders have proved to be very successful for speech enhancement....


Speech enhancement with weighted denoising auto-encoder

A novel speech enhancement method with Weighted Denoising Auto-encoder (WDA) is proposed in this paper. A weighted reconstruction loss function is introduced to the conventional Denoising Auto-encoder (DA), making it suitable for the task of speech enhancement. First, the proposed WDA is used to model the relationship between the noisy and clean power spectra of the speech signal. Then, the es...


Pattern Recognition: Invariance Learning in Convolutional Auto Encoder Network

The ability of the human visual processing system to accommodate and retain clear understanding or identification of patterns irrespective of their orientations is quite remarkable. Conversely, pattern invariance, a common problem in intelligent recognition systems is not one that can be overemphasized; obviously, one's definition of an intelligent system broadens considering the large variabil...


Gradual training of deep denoising auto encoders

Stacked denoising auto encoders (DAEs) are well known to learn useful deep representations, which can be used to improve supervised training by initializing a deep network. We investigate a training scheme of a deep DAE, where DAE layers are gradually added and keep adapting as additional layers are added. We show that in the regime of mid-sized datasets, this gradual training provides a small ...



Journal title:

CoRR

Volume abs/1506.05268

Pages: -

Publication date: 2015
